Mining Textual Data in Croatian

Authors

  • Bojana Dalbelo Basic
  • Boris Berecek
  • Ana Cvitas
Abstract

Textual data is a very useful source of information for business intelligence systems. Text processing algorithms and systems for English and other major world languages are well developed, which is not the case for the Croatian language. This paper explores the applicability of existing systems and examines optimal parameters for Croatian. The quality of input data strongly influences clustering and classification results: experiments give significantly better results after noise is reduced. The impact of learning set size and dimensionality is also considered. Preprocessing specific to Croatian consists of morphological normalisation, a useful step towards better results. Specialised text mining tools not built for Croatian are also applicable.
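
The effect of morphological normalisation on dimensionality can be pictured with a minimal sketch. Everything in it is an illustrative assumption rather than the paper's method: the tiny suffix list, the function names `normalise` and `bag_of_words`, and the sample words. A real system would rely on a morphological lexicon or a proper Croatian stemmer.

```python
from collections import Counter

# Hypothetical, deliberately tiny suffix list (an assumption for illustration);
# it is not taken from the paper and covers only a few Croatian case endings.
SUFFIXES = ("ima", "ama", "om", "em", "a", "e", "i", "u", "o")


def normalise(token: str, min_stem: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    token = token.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def bag_of_words(text: str, normalised: bool) -> Counter:
    """Count whitespace-separated tokens, optionally after crude normalisation."""
    tokens = text.lower().split()
    if normalised:
        tokens = [normalise(t) for t in tokens]
    return Counter(tokens)


# Inflected forms of "banka" (bank) and "kredit" (loan).
doc = "banka banke banku bankama kredit kredita kreditima"
raw = bag_of_words(doc, normalised=False)
norm = bag_of_words(doc, normalised=True)

# Seven distinct surface forms collapse into two features.
print(len(raw), len(norm))  # prints "7 2"
```

In this toy case the feature space shrinks from seven dimensions to two, which hints at why normalisation can help clustering and classification in a morphologically rich language. Crude suffix stripping does not handle sound changes (for example `banci`, the dative of `banka`), which is one reason a lexicon-based approach is usually preferred.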

Related resources

The automatic creation of concept maps from documents written using morphologically rich languages

A concept map is a graphical tool for representing knowledge. Concept maps have been used in many different areas, including education, knowledge management, business and intelligence. Manually constructing concept maps can be a complex task; an unskilled person may encounter difficulties in determining and positioning concepts relevant to the problem area. An application that recommends concept candi...

Automatic creation of a concept map

A concept map is a graphical technique for representing knowledge, successfully used in different areas, including education, knowledge management, business and intelligence. In this paper, an overview of different approaches to the automatic creation of concept maps from textual and non-textual sources is given. The concept map mining process is defined, and one method for creation of concept maps from ...

A model for extracting information from text documents, based on text mining in the e-learning domain

As computer networks become the backbones of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts or new hypotheses by automatically extracting information from different written documents. T...

Towards Personalized Maps: Mining User Preferences from Geo-textual Data

Rich geo-textual data is available online and keeps growing at a high speed. We propose two user behavior models to learn several types of user preferences from geo-textual data, and a prototype system built on top of the user preference models for mining and searching geo-textual data (called PreMiner) to support personalized maps. Different from existing recommender systems and data analys...

Using Supervised Clustering Technique to Classify Received Messages in 137 Call Center of Tehran City Council

Supervised clustering is a data mining technique that assigns a set of data to predefined classes by analyzing dataset attributes. It is considered an important technique for information retrieval, management, and mining in information systems. Since customer satisfaction is the main goal of organizations in modern society, to meet the requirements, the 137 call center of Tehran city council is ...

Publication date: 2005